[SPARK-12934][SQL] Count-min sketch serialization#10893
[SPARK-12934][SQL] Count-min sketch serialization#10893liancheng wants to merge 5 commits intoapache:masterfrom
Conversation
692f93e to
e97d7f9
Compare
|
Test build #49967 has finished for PR 10893 at commit
|
There was a problem hiding this comment.
let's put the implementation in CountMinSketchImpl, and simply delegate to a static method there
There was a problem hiding this comment.
also somewhere in CountMinSketchImpl we should document the serialization format.
There was a problem hiding this comment.
Can we move this out so that we can also use it in bloom filter?
There was a problem hiding this comment.
This should be count min sketch specific, because the two binary protocols can and should evolve separately.
5e0c225 to
4abc4e0
Compare
|
Test build #50023 has finished for PR 10893 at commit
|
|
Test build #50025 has finished for PR 10893 at commit
|
|
Test build #50020 has finished for PR 10893 at commit
|
There was a problem hiding this comment.
should we move this comment near writeTo?
There was a problem hiding this comment.
Actually I couldn't decide whether to put it before writeTo or readFrom, and then ended up here...
|
LGTM overall |
There was a problem hiding this comment.
should add some check here to throw an exception if the version number is not 1.
|
Going to merge this. Thanks. Please address my (one) feedback in your next pr. |
There was a problem hiding this comment.
This should be closed before returning from the method, right ?
There was a problem hiding this comment.
But DataInputStream.close() also closes the underlying input stream, which isn't our expected behavior, is it?
This PR adds serialization support for
CountMinSketch.A version number is added to version the serialized binary format.